NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Ps and Qs: Quantization-Aware Pruning for Efficient Low Latency Neural Network Inference

https://doi.org/10.3389/frai.2021.676564

Hawks, Benjamin; Duarte, Javier; Fraser, Nicholas J.; Pappalardo, Alessandro; Tran, Nhan; Umuroglu, Yaman (July 2021, Frontiers in Artificial Intelligence)
null (Ed.)
Efficient machine learning implementations optimized for inference in hardware have wide-ranging benefits, depending on the application, from lower inference latency to higher data throughput and reduced energy consumption. Two popular techniques for reducing computation in neural networks are pruning, removing insignificant synapses, and quantization, reducing the precision of the calculations. In this work, we explore the interplay between pruning and quantization during the training of neural networks for ultra low latency applications targeting high energy physics use cases. Techniques developed for this study have potential applications across many other domains. We study various configurations of pruning during quantization-aware training, which we term quantization-aware pruning , and the effect of techniques like regularization, batch normalization, and different pruning schemes on performance, computational complexity, and information content metrics. We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task. Further, quantization-aware pruning typically performs similar to or better in terms of computational efficiency compared to other neural architecture search techniques like Bayesian optimization. Surprisingly, while networks with different training configurations can have similar performance for the benchmark application, the information content in the network can vary significantly, affecting its generalizability.
more » « less
Full Text Available
GPU-Accelerated Machine Learning Inference as a Service for Computing in Neutrino Experiments

https://doi.org/10.3389/fdata.2020.604083

Wang, Michael; Yang, Tingjun; Flechas, Maria Acosta; Harris, Philip; Hawks, Benjamin; Holzman, Burt; Knoepfel, Kyle; Krupa, Jeffrey; Pedro, Kevin; Tran, Nhan (January 2021, Frontiers in Big Data)
null (Ed.)
Machine learning algorithms are becoming increasingly prevalent and performant in the reconstruction of events in accelerator-based neutrino experiments. These sophisticated algorithms can be computationally expensive. At the same time, the data volumes of such experiments are rapidly increasing. The demand to process billions of neutrino events with many machine learning algorithm inferences creates a computing challenge. We explore a computing model in which heterogeneous computing with GPU coprocessors is made available as a web service. The coprocessors can be efficiently and elastically deployed to provide the right amount of computing for a given processing task. With our approach, Services for Optimized Network Inference on Coprocessors (SONIC), we integrate GPU acceleration specifically for the ProtoDUNE-SP reconstruction chain without disrupting the native computing workflow. With our integrated framework, we accelerate the most time-consuming task, track and particle shower hit identification, by a factor of 17. This results in a factor of 2.7 reduction in the total processing time when compared with CPU-only production. For this particular task, only 1 GPU is required for every 68 CPU threads, providing a cost-effective solution.
more » « less
Full Text Available
hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices

Fahim, Farah; Hawks, Benjamin; Herwig, Christian; Hirschauer, James; Jindariani, Serge; Nhan, Trần; Carloni, Luca; DiGuglielmo, Giuseppe; Harris, Phillip; Krupa, Jeffrey; et al (April 2021, ArXivorg)
null (Ed.)
Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-hardware codesign workflow to interpret and translate machine learning algorithms for implementation with both FPGA and ASIC technologies. We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations and increased usability: new Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long pipeline kernels for low power, and new device backends include an ASIC workflow. Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery.
more » « less
Full Text Available

Search for: All records